Towards Knowledge Discovery from the Vatican Secret Archives. In Codice Ratio – Episode 1: Machine Transcription of the Manuscripts
Authors
Abstract
In Codice Ratio is a research project to study tools and techniques for analyzing the contents of historical documents conserved in the Vatican Secret Archives (VSA). In this paper, we present our efforts to develop a system to support the transcription of medieval manuscripts. The goal is to provide paleographers with a tool to reduce their efforts in transcribing large volumes, such as those stored in the VSA, producing good transcriptions for significant portions of the manuscripts. We propose an original approach based on character segmentation. Our solution is able to deal with the dirty segmentation that inevitably occurs in handwritten documents. We use a convolutional neural network to recognize characters and language models to compose word transcriptions. Our approach requires minimal training efforts, making the transcription process more scalable, as the production of training sets requires a few pages and can be easily crowdsourced. We have conducted experiments on manuscripts from the Vatican Registers, an unreleased corpus containing the correspondence of the popes. With training data produced by 120 high school students, our system has been able to produce good transcriptions that can be used by paleographers as a solid basis to speed up the transcription process at a large scale.
Similar resources
In Codice Ratio: Scalable Transcription of Historical Handwritten Documents
Huge amounts of handwritten historical documents are being published by digital libraries worldwide. However, for these raw digital images to be really useful, they need to be annotated with informative content. State-of-the-art Handwritten Text Recognition (HTR) approaches require an impressive training effort by expert paleographers. Our contribution is a scalable, end-to-end transcription w...
Designing an Ontology for Knowledge Discovery in Iran’s Vaccine
Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...
The role and contribution of Kashan city in the field of Islamic manuscripts of Iran and Iraq
Purpose: Manuscripts, which are considered the cultural, scientific and artistic heritage of nations, are important in cultural, scientific and artistic terms, and are regarded as signs of the cultural power and scientific development of every country and region. In Iran, most of the manuscripts are in a few cities with a rich history of civilization, science and culture, one o...
Drug Discovery Acceleration Using Digital Microfluidic Biochip Architecture and Computer-aided-design Flow
A Digital Microfluidic Biochip (DMFB) offers a promising platform for medical diagnostics, DNA sequencing, Polymerase Chain Reaction (PCR), and drug discovery and development. Conventional drug discovery procedures require time-consuming and costly manned experiments with a high degree of human error and no guarantee of success. On the other hand, DMFB can be a great solution for miniaturization, int...
Add-on for High Throughput Screening in Material Discovery for Organic Electronics: “Tagging” Molecules to Address the Device Considerations
This work reflects the worth of intelligent modeling in controlling the nanostructure morphology in manufacturing organic bulk heterojunction (BHJ) solar cells. It suggests the idea of screening the pool of material design possibilities inspired by machine learning. To fulfill this goal, a set of experimental data on a BHJ solar cell with a donor structure of diketopyrrolopyrrole (DDP) and ...
Journal title:
Volume, issue
Pages -
Publication date: 2018